Abstract



The ecosystem service concept is at the interface of ecology, economics and politics, with scientific
results rapidly translated into management or political action. This emphasises the importance of
reliable recommendations provided by scientists. We propose to use evidence-based practice in
ecosystem service science in order to evaluate and improve the reliability of scientific statements.
For this purpose, we introduce a level-of-evidence scale ranking study designs (e.g. review,
case-control, descriptive) in combination with a study quality checklist. For illustration, the
concept was directly applied to 12 case studies. We also review criticisms levelled against
evidence-based practice and how it applies to ecosystem services science. We further discuss who
should use the evidence-based concept and suggest important next steps, with a focus on the
development of guidelines for methods used in ecosystem service assessments.

Ecosystem services, the benefits humans derive from nature, have gained popularity over the past
ten years (Raffaelli and White 2013). The concept provides a common discussion ground in
science-policy interaction (Daily et al. 2009). Besides the positive aspects of increasing popularity and
public attention, it runs the risk of serving as a buzzword boosting scientifically weak studies
(Vihervaara et al. 2010). To lend scientific credibility to the ecosystem services concept, we need to
improve the scientific basis of ecosystem services, together with an increased awareness about the
reliability of current results (Carpenter et al. 2009; Boyd 2013).



It was medicine that pioneered the evidence-based concept, assessing the reliability of scientific
statements and encouraging practitioners (doctors) to use only the most solid recommendations
(Sackett et al. 1996; Cochrane Collaboration, www.cochrane.org). In evidence-based medicine,
scientific results are ranked hierarchically according to their study design and quality (OCEBM Levels
of Evidence Working Group 2011). Such a scale permits the identification of the most reliable
recommendation for diagnoses and treatments.

New concepts invite scrutiny, and evidence-based practice has not escaped criticism. We
discuss the central arguments raised against evidence-based practice. Despite this criticism,
evidence-based practice is successfully implemented and applied in medicine today. The concept is
also mentioned in other areas, including justice (www.campbellcollaboration.org), economics (Reiss
2004) and environmental science such as conservation (Pullin and Knight 2001, 2009; Sutherland et al.
2004) or forestry (Binkley and Menyailo 2005; Petrokofsky et al. 2011).
In environmental science, the most relevant steps towards an evidence-based practice were the
introductions of the journals 'Conservation Evidence' in 2004 and 'Environmental Evidence' in 2011,
by the Collaboration for Environmental Evidence (www.environmentalevidence.org). The editors were
the first to transfer evidence-based medicine to conservation (Pullin and Knight 2001). Discussions
arose about the hierarchy of study designs that should be used in environmental science. Pullin and
Knight (2001) and Petrokofsky et al. (2011) encouraged the use of a scale closely related to medicine,
but this scale did not represent well the approaches normally used in environmental science.
Sutherland et al. (2004) argue that we cannot use a hierarchy at all because conservation, and
environmental science more generally, is less straightforward and less well resourced than medicine.
Nevertheless, these authors agree that the top of the hierarchy, the gold standard, is represented by
systematic reviews, and therefore the Collaboration for Environmental Evidence highly emphasises
the generation of systematic reviews (Pullin and Knight 2009; Sutherland et al. 2004; Petrokofsky
et al. 2011). Systematic reviews are not the only source of information for practitioners, scientists and
policy makers, and evidence-based practice involves tracking down the best available evidence with
which to answer the question at hand (Sackett et al. 1996).

Our aim is to propose a hierarchy and a quality checklist ranking the strength of evidence of 
common study designs in combination with quality criteria. These are valid for all environmental 
science studies. We further introduce evidence-based practice to ecosystem service science, which has 
not yet seen it in use. Scientists and decision makers should elucidate and transparently quantify the 
reliability of knowledge and thus the scientific basis for decisions taken. We give clear guidance on 
the terminology around evidence-based practice, to ensure that scientists and practitioners can
communicate effectively across the disciplines and backgrounds. In the last section we provide 
examples for the application of the concept, respond to common criticism and offer suggestions for 
the next steps. 



The evidence-based concept 

The terminology used around evidence-based practice is diverse even in the medical field. However, a
well-defined terminology is essential for good communication between practitioners and scientists. 

According to the Oxford English Dictionary, evidence is the available body of facts or information
indicating whether a belief or proposition is true or valid
(www.oxforddictionaries.com/definition/english/evidence). In other words, evidence is a measure of the
knowledge behind a statement. The strength of evidence reflects the quality of our knowledge and
we can identify whether a statement is based on high or low evidence, hence very reliable or hardly
reliable. Following this argumentation, evidence-based practice means that actions are undertaken
based on knowledge of the highest available reliability. It further means that if high evidence results
are missing, the end-user is aware of the low reliability of the statement.

Evidence-based practice starts with a question or a purpose (Fig. 1). The way to the answer, i.e. to
the outcome of the study, implies a study design. The study design is the set-up of the investigation,
e.g. controlled, correlative or observational. These study designs are not equally good, leading to
different strengths of evidence. In order to derive a level of evidence, we need a hierarchical scale
ranking study designs. Further, the implementation of the design is important, and is assessed in the
critical appraisal. Study designs with a high level of evidence can be implemented poorly. We provide
a quality checklist to derive the study quality further below. With the help of the critical appraisal we
determine the final level of evidence, depending on the study design as well as on quality criteria.



[Figure 1: flow chart. 1. Question/Purpose; 2. Evidence assessment (2a. Study design, giving the level of
evidence; 2b. Quality criteria, giving quality points; both enter the critical appraisal); Final level of
evidence; Outcome/Statement.]

Figure 1. Schematic procedure in evidence-based practice: 1. Identification of the question/purpose of the study
and the outcome/statement, given as the result of the study. 2. The assessment of the evidence supporting the
outcome, with the help of a level-of-evidence pyramid (△) and a quality checklist (□).



1. Question, outcome and the context

As in all of science, the purpose of the investigation, ideally in the form of a question, has to be clear. Still,
it is sometimes surprisingly challenging to ask a question correctly. For example, the question has to
fulfil certain criteria to be well-focused and answerable (Higgins and Green
2011; Collaboration for Environmental Evidence 2013, p. 20-23). For ecosystem service science, we
suggest specifying, in addition to the question, the environment and the context. Knowing
which ecosystem service is investigated in which system is necessary to determine the
context for the validity of the answer. Ecosystem service science is interdisciplinary and combines
ecology, economics, politics and other social and natural sciences. In order to know which field we
operate in, it is recommended to determine the facet of the ecosystem services question:

1. Quantification of ecosystem services: the amount of an ecosystem service or a set of services. 
It can be measured in absolute units or relative to another system. 

2. Valuation of ecosystem services: the societal value of a service or a set of services. The most 
common way is monetary valuation. Other possibilities are in relation to a reference system or 
on a ranked scale (high, middle, low value). 

3. Management of ecosystem services: the management/treatment of an ecosystem to favour 
specific ecosystem services. For example, leaving dead wood in forests to increase biodiversity or
reducing agricultural fertiliser to decrease nearby lake eutrophication. 
4. Governance of ecosystem services: the strategy to steer a management type. The tools used are
either incentives (subsidies) or penalties (law/tax).

Ideally these facets are investigated in the presented order, starting with the quantification of an
ecosystem service, which should then be valued. The most valuable services will be favoured by a
well-adapted management option and in the end a governance strategy of how to steer the preferred
type of management is implemented. Deviations from this structure are common, e.g. valuation does not
necessarily require prior quantification. However, to cover the whole breadth of ecosystem service
science, all four steps are required.

We have highlighted the question, context and facet. In an ecosystem services study, this is
followed by the actual investigation. The outcome is usually the result of the study; it is the answer to
the originally formulated question.



2. Evidence assessment

The outcome of an investigation can be of high or low reliability depending on what was done to achieve
the answer. The evidence assessment investigates the study design and the quality in order to
determine the reliability of the outcome. In the following we present an evidence assessment not only
for ecosystem service science, but also for all other environmental sciences.



Level-of-evidence pyramid

At the heart of evidence-based practice lies the hierarchy to rank the study designs (Fig. 2). The study
design determines whether a study yields high or low evidence. Systematic reviews (LoE1a) are at the top
end of the level-of-evidence scale and provide the most reliable information. They summarise all
information gained in several individual studies and are conducted according to strict guidelines (e.g.
Collaboration for Environmental Evidence 2013). Ideally they include quantitative measures, at best a
meta-analysis (in the strict sense; see Borenstein et al. 2009; Vetter et al. 2013). Other more
conventional reviews (LoE1b) may also include quantitative analysis or be purely qualitative. They
both summarise the findings of several studies, but conventional reviews are less complete, not
reproducible and often suffer more from publication bias.

The necessary condition for any review is that appropriate individual studies are available. The
most reliable individual studies are studies with a reference (LoE2). Typically, these are
case-control or before-after control-impact studies. Method comparison can be useful for the
valuation of ecosystem services, where no 'true' reference exists; however, the results of both
methods have to be consistent to provide high evidence.

Uncontrolled correlative and regressive studies (LoE3) are studies investigating, for example,
the influence of environmental variables on the quantity of an ecosystem service. Descriptive
studies, also called observational studies (LoE4), present the data collected, sometimes in
summary statistics or ordinations, or they feed into simulation models. They are based on data, but not
conducted in a controlled or correlative design.



The lowest level of evidence are statements that are not based on any data (LoE5). These are
usually anecdotes or expert opinions, the latter often no better than random (Tetlock 2005).
Even if their argumentation is a mechanism-based reasoning ('first principles': A works according to a
certain mechanism, so we expect B to work in the same way), we cannot rely on these statements in
the context of ecosystem services, where no first principles exist (Lawton 1999).



LoE1  Review
      a Systematic
      b Conventional
LoE2  Studies with a reference
      a Case-control, before-after control-impact
      b Method comparison with consistent results
LoE3  Correlative & regressive studies
LoE4  Descriptive/observational studies
      e.g. surveys without control/correlative analysis, simulation models with data input
LoE5  Studies not based on data
      e.g. expert opinion, mechanism-based reasoning

Figure 2. Level-of-evidence (LoE) pyramid ranking study designs according to their evidence: LoE1 to LoE5, with
subcategories a and b.



It is important to note that 'method' and 'design' should not be confused. Methods are the means
used to collect or analyse data, e.g. remote sensing, questionnaires, ordination techniques, model
types. The design reflects how the study was planned and conducted, e.g. a case-control or descriptive
design. For some methods, the underlying design is not easy to identify. Remote sensing, for example,
can be done purely descriptively, with a valid reference such as ground-truthing, or in a 'before-after'
design. Most methods used in a descriptive design could actually follow a controlled design, but do not
necessarily do so.

Critical appraisal

The critical appraisal assesses the quality of the implementation of a study design. A study with a high
evidence design may be poorly conducted. The critical appraisal identifies the study and reporting
quality. It may lead to a correction of the level of evidence, so that the final level of evidence
supporting the outcome is lower than the one allocated according to the design. This depends on
objective, sometimes design- or facet-specific criteria. Several literature sources provide lists of
quality criteria (e.g. Rychetnik et al. 2001; Pullin and Knight 2003; Söderqvist and Soutukorva 2006;
Balshem et al. 2011; Oxford Centre for Evidence-based Medicine 2011, www.cebm.net). We combined
these lists into a general quality checklist (Box 1). The checklist consists of 33 questions with the
possibility to use only a subset if some questions are not appropriate for the specific context. All
questions answered with 'yes' receive one point (or two points if it is an important question, in bold
font in Box 1), and zero points if answered with 'no'. In case of non-reported issues, we advise the
answer 'no' to indicate a deficient reporting quality. The percentage of points received out of possible
points will help to decide whether to downgrade the level of evidence.



> 75% of total points     -> no shortcomings            -> no downgrading
50 - 75% of total points  -> shortcomings               -> downgrading by one level
25 - 50% of total points  -> serious shortcomings       -> downgrading by two levels
< 25% of total points     -> very serious shortcomings  -> downgrading by three levels



For example, suppose the first 17 questions of the checklist (Box 1) were answered, 10 of them (including
the 3 bold ones) with 'yes' and 7 with 'no'. Then 13 out of 20 points (65%) were reached. 65% means that
there are shortcomings and it is suggested to downgrade the study by one level of evidence.
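
The scoring rule can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the published method: the function names `checklist_percentage` and `downgrade_steps` and the `(answered_yes, is_bold)` input format are hypothetical. The handling of scores that fall exactly on a class boundary is an assumption: 75% and 25% are read from the strict inequalities in the table, and 50% is counted as 'shortcomings'.

```python
from typing import Iterable, Tuple


def checklist_percentage(answers: Iterable[Tuple[bool, bool]]) -> float:
    """Return the percentage of checklist points reached.

    Each item is (answered_yes, is_bold). Questions left out as not
    appropriate are simply not passed in; non-reported issues are
    entered as 'no' (False), as advised in the text.
    """
    reached = 0
    possible = 0
    for answered_yes, is_bold in answers:
        weight = 2 if is_bold else 1  # bold (important) questions count double
        possible += weight
        if answered_yes:
            reached += weight
    if possible == 0:
        raise ValueError("at least one question must be answered")
    return 100.0 * reached / possible


def downgrade_steps(percentage: float) -> int:
    """Map the percentage of points to the number of levels to downgrade.

    Boundary handling is an assumption (see the lead-in).
    """
    if percentage > 75:
        return 0  # no shortcomings
    if percentage >= 50:
        return 1  # shortcomings
    if percentage >= 25:
        return 2  # serious shortcomings
    return 3      # very serious shortcomings


# Worked example from the text: 17 questions, 3 of them bold;
# 10 answered 'yes' (including the 3 bold ones) and 7 answered 'no'.
answers = [(True, True)] * 3 + [(True, False)] * 7 + [(False, False)] * 7
pct = checklist_percentage(answers)
print(f"{pct:.0f}% -> downgrade by {downgrade_steps(pct)} level(s)")
```

The result is a suggestion only; the final judgement remains with the user.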

We encourage the use of the checklist for orientation, but we want to emphasise that this
procedure cannot be fully standardised. Quality aspects can also depend on the context of the study
and the final judgement will remain with the user. Reviews provide information on the highest level
of evidence and their critical appraisal differs from that of other designs, because they themselves are
based on studies with lower evidence (see Box 1: section review). If only studies based on low
evidence were included, the quality assessment should downgrade a review to LoE4, and if in addition
other quality issues showed serious shortcomings, even to LoE5.



Application of the evidence-based concept 

The most popular application of the evidence-based concept is a systematic review that is used to
summarise all knowledge available for a specific question. A systematic review is, however, time
consuming, and if policy makers need a specific answer in a shorter time, a 'rapid evidence assessment'
(UK Civil Service 2013) can be used as an alternative to a systematic review. Another approach to
evidence-based practice are synopses. Synopses do not focus on a specific question but bring together
information from a much broader topic, e.g. from a whole animal class, such as amphibians (Smith
and Sutherland 2014). A third possibility to use the evidence-based concept are guidelines that
recommend tools/methods based on the best available evidence. These 'best practice guides' will focus
on methods and the questions are therefore less typical systematic review questions, e.g. 'How much
CO2 is stored in European temperate forests?', but more like 'Which is the best method to measure
CO2 stored in temperate forests?' This serves to allow forest scientists to apply the best method to
any temperate forest. In the case of evidence-based ecosystem service science, this would also identify
the evidence base of common instruments and tools, e.g. InVEST (Tallis and Polasky 2009). All these
possibilities for the application of the evidence-based concept summarise individual studies and
therefore require the evaluation of the evidence of individual studies included. In systematic reviews
this is typically done as a step in the critical appraisal, but so far a scale and a clear guideline were
missing. With the method described above we can assess the level of evidence of individual studies
and in the following we provide several examples (more details in the supplement tables S1 and S2).






Examples of evidence-based practice 

'How does adding dead wood influence the provision of ecosystem services?' was a question
addressed by Acuña et al. (2013). They investigated two ecosystem services (food (fish) and retention
of organic and inorganic matter) in a river-forest ecosystem in Spain and Portugal and studied the
effect of a management intervention. Their study design followed a before-after control-impact
approach, which is LoE2. The critical appraisal (see supplement table S2) revealed shortcomings: only
14 out of 24 points (58%) were gained. The level of evidence was downgraded by one level to level
three. We therefore conclude that the statement made by Acuña et al. (2013), 'restoration of natural
wood loading in streams increases the ecosystem service provision', is based on LoE3. In addition they
valued the ecosystem services, which is a subquestion of the study ('What is the value of ecosystem
services provided by streams?'). This can also be assessed for its evidence, which is especially
important to guarantee multiple lines of evidence.



A second example is the governance-related question by Entenmann and Schmitt (2013): 'Do
stakeholders relate REDD+ to biodiversity conservation?' They found that synergies between REDD+
and biodiversity conservation were assumed by stakeholders. It is an observational design (LoE4),
receiving only 10 of 20 quality points and therefore downgraded to LoE5.



The third example was a systematic review by Bowler et al. (2010), conducted according to the
guidelines of the Collaboration for Environmental Evidence (2013). They investigated the effect of
greening urban areas on the air temperature to mitigate heat exposure, a management-related
question. They found that green space in an urban area is on average 1 °C cooler than a built-up site.
According to the quality assessment the study achieved 24 out of 26 points (92%) and it therefore
remained at the originally assigned highest level, LoE1a.
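
The three appraisals above can be recomputed with a short sketch. The names used here (`final_level`, `examples`) are hypothetical. Two points are assumptions rather than rules stated in the text: subcategories a and b are dropped once a study is downgraded, and the result is capped at LoE5. A score of exactly 50% is treated as 'shortcomings' (one level), in line with the second example.

```python
def final_level(design_level: str, points: int, possible: int) -> str:
    """Return the final level of evidence after the critical appraisal.

    design_level is a label such as 'LoE1a', 'LoE2' or 'LoE4'.
    """
    pct = 100.0 * points / possible
    if pct > 75:
        steps = 0  # no shortcomings
    elif pct >= 50:
        steps = 1  # shortcomings
    elif pct >= 25:
        steps = 2  # serious shortcomings
    else:
        steps = 3  # very serious shortcomings
    if steps == 0:
        return design_level  # keep the label, including a/b
    main_level = int(design_level[3])  # 'LoE2' -> 2, 'LoE1a' -> 1
    # Assumption: LoE5 is the lowest level, so downgrading stops there.
    return f"LoE{min(main_level + steps, 5)}"


# (design level, quality points reached, points possible), as reported above
examples = {
    "Acuña et al. (2013)": ("LoE2", 14, 24),
    "Entenmann and Schmitt (2013)": ("LoE4", 10, 20),
    "Bowler et al. (2010)": ("LoE1a", 24, 26),
}

for study, (design, points, possible) in examples.items():
    print(f"{study}: {design} -> {final_level(design, points, possible)}")
```

Under these assumptions the sketch reproduces the final levels reported for the three studies (LoE3, LoE5 and LoE1a).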



Common criticisms

Evidence-based practice (EBP) has faced criticism that we do not want to ignore. In the following, we
discuss the most common arguments raised in evidence-based medicine and conservation (Straus and
McAlister 2000; Mullen and Streiner 2004; Adams and Sandbrook 2013).

1. Cookbook problem

EBP is a cookbook approach denigrating professional expertise and replacing it with manualized
procedures. Best practice guidelines cannot replace the expertise of practitioners, and best practice
recommendations will profit greatly from additional expertise, determining whether the evidence is
applicable to a particular problem, bearing in mind unique circumstances (Mullen and Streiner 2004).

2. EBP ignores individual variability

EBP oversimplifies complex relations and denigrates individual variability (Sackett et al. 1996; Feinstein
and Horwitz 1997; Straus and McAlister 2000; Gabbay and May 2004; Mullen and Streiner 2004).
Individual variability may overwhelm general patterns, making predictions useless. However,
decision-making requires the identification of general patterns to predict an outcome. Predictions
based on the highest available evidence provide a higher probability of reaching the desired outcome and are
therefore better than any unproven alternative (Mullen and Streiner 2004).

3. EBP ignores qualitative data

EBP has been accused of neglecting qualitative data, such as local and indigenous knowledge (Adams and
Sandbrook 2013). Quantitative data allow for more sensitive statistical testing and provide more
information than categorical knowledge. However, qualitative data are much better than none at all
and can add valuable information (Sale et al. 2002). As Haddaway and Pullin (2013) point out: all
evidence counts. All information contributes to systematic reviews to ensure completeness.

4. No evidence that EBP works

There is insufficient evidence that EBP works better than conventional approaches (Mullen and Streiner
2004). EBP emerged from conventional practice over many years. Hence, there is no easy distinction
between 'the conventional approach' and the evidence-based concept. Studies based on controlled or
descriptive designs have been sound scientific practice for centuries, and evidence-based research only
emphasises identifying them as such. Still, we agree that the same rigour of reasoning should be
applied, at a meta-level, to the concept of evidence, too. To date, too few data seem to exist to compare
evidence-based decision-making with its more conventional cousin.
5. Environmental science is too complex for EBP 

EBP works in medicine, but cannot work in environmental science, because the socio-ecological
system is more complex than a human body (Adams and Sandbrook 2013). Complexity is not, in
itself, a reason to abandon evidence. While the medical research field certainly differs from
environmental studies, few physicians would agree that it is less complex. More importantly, however,
the medical professional typically has hundreds to thousands of cases to learn from over a lifetime,
while conservation ecologists work on only a very few cases. Thus, the setting for learning from
experience is very different and would actually demand a more evidence-based approach to the more
complex system (Gilovich et al. 2002).



6. Time and resource demanding

EBP requires a long time to conduct a systematic review. While in general true, this argument is
misleading (Straus and McAlister 2000). As soon as a database with systematic reviews and
best-practice guidelines exists (see e.g. the Cochrane Collaboration and the Collaboration for
Environmental Evidence), practitioners take less time to find an answer to their question than before.






Downloaded from http://biorxiv.org/ on September 18, 2014 






There are further criticisms specifically addressing meta-analysis and its methodological
implementation (Thompson and Pocock 1991; Bateman and Jones 2003). We will not elaborate on
methodological details, but recognise that it is crucial to properly conduct and interpret
meta-analyses, and refer to Borenstein et al. (2009, ch. 43) for a detailed discussion of these
aspects.






Relevance for different user groups 






In the previous section we have elaborated how to employ the evidence-based concept. Now we want 
to provide a few notes on who should use it: 

1. Scientists conducting their own studies have to be aware of how to achieve the highest possible
evidence, particularly during the planning phase. Choosing a study design that provides good
evidence and respects quality aspects will substantially increase the potential contribution to our
knowledge.

2. Scientists advising decision-makers should be aware of the evidence of the information they
include in their recommendations. Weighting all scientific information equally, or subjectively, runs
the risk of overconfidence and bias.

3. Decision-makers receiving information from scientists should demand a level-of-evidence
statement for the information provided, or should themselves judge the reliability with the
evidence-based concept in mind.

4. We would further like to encourage consortia, international panels and learned societies,
such as the Intergovernmental Platform on Biodiversity & Ecosystem Services (IPBES), EU
projects or ecological societies (BES, ESA, INTECOL), to develop guidelines with
recommendations on methods to best quantify, value, manage or govern a desired ecosystem service
or bundle of services. This would give decision-makers a toolbox, making the common procedure
('decision-makers seeking advice from individual scientists') superfluous. These 'best practice guides'
would ideally exist for every single ecosystem service and for the sum of ecosystem services, in every
facet and in every ecosystem. For example, we may want to ask what is the best way to quantify recreation, to value
recreation, to manage recreation and to use governance strategies that foster sustainable recreation in
a temperate forest. Each best practice guide would clearly state its level of evidence. At a higher level,
where the sum of all ecosystem services in one ecosystem needs to be evaluated, it would make sense to
have a best practice guide on how to measure, say, the total (economic) value (e.g. summing individual
values up with a strategy to avoid double-counting (Boyd and Banzhaf 2007; DEFRA 2007)). All this
may sound unrealistic, given the huge number of methods, ecosystem services, management and
governance options and so forth. However, in medicine, national and international learned societies
set up assessment and guideline boards for exactly this purpose (often with governmental support, e.g.
the UK's National Institute for Health and Care Excellence (NICE), www.nice.org.uk, or Germany's
IQWiG, www.iqwig.de). There are currently 261 recognised diseases with over 12000 sub-categories
(ICD-10). This is certainly at the same scale as the challenges faced by ecosystem service science.

Conclusion

We introduced the evidence-based concept in ecosystem service science, encompassing a scale to
judge the available evidence and a quality checklist to facilitate critical appraisal. We further showed
in detail and illustrated with examples how to use the concept. Additional support and guidance can
be obtained from the Collaboration for Environmental Evidence (www.environmentalevidence.org).

Evidence-based ecosystem service science does not suggest a specific management strategy. It
is by no means a contradiction of or replacement for adaptive management or other management
concepts. Rather, it complements these approaches, emphasising that whatever is used should be used
with the awareness of how well supported our knowledge is.

Wrong decisions can have strong negative consequences. This is particularly painful if studies
providing high evidence were available, but decisions were instead based on myth or low evidence
studies. Taking again an example from medicine, child mortality from sudden infant death was
unnecessarily high for decades due to wrong recommendations based on low evidence, ignoring the
higher evidence available (Gilbert et al. 2005). Especially on topics with various and contradicting
opinions, it is important to continuously summarise and update the available evidence. If farmers
have no reliable information on the management of natural pest control versus pesticides (Wright
et al. 2013), their actions may result in huge and avoidable economic loss or even directly affect
human health.

It should have become clear that evidence-based ecosystem service science concerns scientists as
well as decision-makers and the general public. In the interest of a responsible use of environmental
resources and processes, we strongly encourage embracing evidence-based practice as a paradigm for
all research contributing to ecosystem service science.
Box 1. The quality checklist is designed in the form of questions. Each question answered with 'yes' receives one
point, important aspects (bold type) two points. If a question is not appropriate in the specific context, it may be
left out.

1 Correspondence: Does the question match the answer?
2 Are the assumptions used in the study reasonable?
3 Internal validity: Do design and implementation avoid a high risk of bias?
4 External validity/relevance: Is the result transferable to other scenarios with the same context?
5 Are multiple lines of evidence considered?
6 Was the target population/area defined in space, time and size?
7 Was a sampling population defined? (Which population/area/ecosystem was sampled?)
8 Were potential differences between the target population and the sampling population considered?
9 Were the methods described in sufficient detail to permit replication?
10 Was the sample size appropriate?
11 Was probability sampling used for constructing the sample?
12 If secondary data are used, did an evaluation of the original data collection take place?
13 Is the choice of statistical/analytical method justified and comprehensively explained?
14 Are variables and statistical measures given?
15 Was accuracy/uncertainty assessed and reported?
16 Are results consistent and homogeneous?
17 Magnitude of effect: Is the effect large (and without large uncertainty)?
18 Attrition bias: Are non-response/drop-outs given and is their impact discussed?

Design-dependent aspects:

19 Is there a low probability of publication bias? E.g. results reporting a negative relationship were probably not included
20 Is the review based on high evidence individual studies (several level 2 or level 3 studies)?
21 Validity: Do the studies included respond to the same question?
22 Was the literature searched in a systematic way?
23 Was a meta-analysis (in the strict sense; see Borenstein et al. (2009)) included?
24 Were any other quantitative summary statistics provided?
25 Selection bias: Was the assignment of case-control groups randomized?
26 Were groups designed equally, aside from the investigated point of interest?
27 Performance bias: Was the sampling blinded, e.g. researchers taking samples of a specific area wouldn't know which differences are between these areas?
28 Were there sufficient replicates of treatment and reference groups?

Facet-dependent aspects:

Were future values of ecosystem services considered?
If future values were considered, were they discounted with a well-motivated discount rate?
If aggregate economic values for a population were estimated, was this estimation consistent with the sampling procedure and the definition of the population?
If valuation took place in the form of a questionnaire, was the study pre-tested and piloted?